VideoMAE is a self-supervised video pretraining model based on the Masked Autoencoder (MAE). It learns internal video representations by predicting masked patches, and these representations can then be used for downstream tasks such as video classification.
Video Processing
Transformers
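
The masked patch prediction objective mentioned in the description can be sketched in a few lines of plain Python. This is an illustration only, not the model's actual implementation: the helper names `tube_mask` and `masked_mse` are hypothetical, the sketch assumes the tube-style masking (the same spatial positions hidden in every frame) and high mask ratio reported for VideoMAE, and the sizes used are example values.

```python
import random


def tube_mask(num_frames, num_patches, mask_ratio=0.9, seed=0):
    """Return one boolean mask per frame; True marks a hidden patch.

    Assumption: tube masking, i.e. the same spatial patch positions
    are hidden in every frame of the clip.
    """
    rng = random.Random(seed)
    num_masked = int(mask_ratio * num_patches)
    hidden = set(rng.sample(range(num_patches), num_masked))
    frame_mask = [i in hidden for i in range(num_patches)]
    # Copy the same spatial mask along the time axis.
    return [list(frame_mask) for _ in range(num_frames)]


def masked_mse(pred, target, mask):
    """Mean squared error computed over the hidden patches only.

    pred and target are indexed as [frame][patch][value].
    """
    total, count = 0.0, 0
    for t, frame in enumerate(mask):
        for p, is_hidden in enumerate(frame):
            if is_hidden:
                total += sum((a - b) ** 2
                             for a, b in zip(pred[t][p], target[t][p]))
                count += len(target[t][p])
    return total / count


# Illustrative sizes (assumed): 8 temporal positions, 14 x 14 = 196 patches.
mask = tube_mask(num_frames=8, num_patches=196)
```

During pretraining, the encoder would see only the patches where the mask is False, and a loss like `masked_mse` would compare the decoder's reconstruction with the original values at the hidden positions. For a downstream task such as video classification, the masking step is dropped and the pretrained encoder is fine-tuned on full clips.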